The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
translated by 谷歌翻译
Minimum Bayesian Risk Decoding (MBR) emerges as a promising decoding algorithm in Neural Machine Translation. However, MBR performs poorly with label smoothing, which is surprising as label smoothing provides decent improvement with beam search and improves generality in various tasks. In this work, we show that the issue arises from the un-consistency of label smoothing on the token-level and sequence-level distributions. We demonstrate that even though label smoothing only causes a slight change in the token-level, the sequence-level distribution is highly skewed. We coin the issue \emph{distributional over-smoothness}. To address this issue, we propose a simple and effective method, Distributional Cooling MBR (DC-MBR), which manipulates the entropy of output distributions by tuning down the Softmax temperature. We theoretically prove the equivalence between pre-tuning label smoothing factor and distributional cooling. Experiments on NMT benchmarks validate that distributional cooling improves MBR's efficiency and effectiveness in various settings.
translated by 谷歌翻译
Mainstream image caption models are usually two-stage captioners, i.e., calculating object features by pre-trained detector, and feeding them into a language model to generate text descriptions. However, such an operation will cause a task-based information gap to decrease the performance, since the object features in detection task are suboptimal representation and cannot provide all necessary information for subsequent text generation. Besides, object features are usually represented by the last layer features that lose the local details of input images. In this paper, we propose a novel One-Stage Image Captioner (OSIC) with dynamic multi-sight learning, which directly transforms input image into descriptive sentences in one stage. As a result, the task-based information gap can be greatly reduced. To obtain rich features, we use the Swin Transformer to calculate multi-level features, and then feed them into a novel dynamic multi-sight embedding module to exploit both global structure and local texture of input images. To enhance the global modeling of encoder for caption, we propose a new dual-dimensional refining module to non-locally model the interaction of the embedded features. Finally, OSIC can obtain rich and useful information to improve the image caption task. Extensive comparisons on benchmark MS-COCO dataset verified the superior performance of our method.
translated by 谷歌翻译
太阳能动力学天文台(SDO)是NASA多光谱十年的长达任务,每天都在日常产生来自Sun的观测数据的trabytes,以证明机器学习方法的潜力并铺路未来深空任务计划的方式。特别是,在最近的几项研究中提出了使用图像到图像翻译实际上产生极端超紫罗兰通道的想法,这是一种增强任务较少通道的提高任务的方法,并且由于低下链接而减轻了挑战。深空的速率。本文通过关注四个通道和基于编码器的建筑的排列来研究这种深度学习方法的潜力和局限性,并特别注意太阳表面的形态特征和亮度如何影响神经网络预测。在这项工作中,我们想回答以下问题:可以将通过图像到图像翻译产生的太阳电晕的合成图像用于太阳的科学研究吗?分析强调,神经网络在计数率(像素强度)上产生高质量的图像,通常可以在1%误差范围内跨通道跨通道重现协方差。但是,模型性能在极高的能量事件(如耀斑)的对应关系中大大减少,我们认为原因与此类事件的稀有性有关,这对模型训练构成了挑战。
translated by 谷歌翻译
由于深度神经网络的开发,尤其是对于最近开发的无监督的JND代模型,对公正的显着差异(JND)建模做出了重大改进。但是,他们有一个主要的缺点,即在现实世界信号域而不是在人脑中的感知结构域中评估了生成的JND。当在这两个域中评估JND时,存在明显的差异,因为在现实世界中的视觉信号在通过人类视觉系统(HVS)传递到大脑之前已编码。因此,我们提出了一个受HVS启发的信号降解网络进行JND估计。为了实现这一目标,我们仔细分析了JND主观观察中的HVS感知过程,以获得相关的见解,然后设计受HVS启发的信号降解(HVS-SD)网络,以表示HVS中的信号降解。一方面,知识渊博的HVS-SD使我们能够评估感知域中的JND。另一方面,它提供了更准确的先验信息,以更好地指导JND生成。此外,考虑到合理的JND不应导致视觉注意力转移的要求,提出了视觉注意力丧失以控制JND的生成。实验结果表明,所提出的方法实现了SOTA性能,以准确估计HVS的冗余性。源代码将在https://github.com/jianjin008/hvs-sd-jnd上找到。
translated by 谷歌翻译
我们使用2-wasserstein空间的几何特性在一组概率度量之间发展了一个投影概念。它是为一般的多元概率度量而设计的,在计算上有效地实施,并在常规设置中提供了独特的解决方案。这个想法是使用广义的大地测量学处理瓦斯汀空间的常规切线锥。它的结构和计算属性使该方法适用于各种设置,从因果推断到对象数据的分析。估计因果效应的应用将合成控制的概念概括为具有个体级异质性的多元数据,以及一种在所有时间段内共同估算最佳权重的方法。
translated by 谷歌翻译
从单个RGB图像中估算3D相互作用的手姿势对于理解人类行为至关重要。与大多数直接预测两只相互作用手的3D姿势的先前作品不同,我们建议分解具有挑战性的相互作用姿势估计任务并分别估算每只手的姿势。这样,就可以直接利用单手姿势估计系统的最新研究进度。然而,由于(1)严重的手部阻塞和(2)手的歧义性,手动姿势估计在相互作用的情况下非常具有挑战性。为了应对这两个挑战,我们提出了一种新型的手部划分和去除(HDR)框架,以执行手部斜切和脱离分散术的去除。我们还提出了第一个称为Amodal intredhand数据集(AIH)的大规模合成Amodal手数据集,以促进模型培训并促进相关研究的开发。实验表明,所提出的方法显着优于先前的最新相互作用姿势估计方法。代码和数据可在https://github.com/menghao666/hdr上找到。
translated by 谷歌翻译
基于图形卷积的方法已成功应用于同质图上的表示学习,其中具有相同标签或相似属性的节点往往相互连接。由于这些方法使用的图形卷积网络(GCN)的同义假设,它们不适合异质图,其中具有不同标记或不同属性的节点往往相邻。几种方法试图解决这个异质问题,但是它们没有改变GCN的基本聚合机制,因为它们依靠求和操作员来汇总邻近节点的信息,这隐含地遵守同质假设。在这里,我们介绍了一种新颖的聚合机制,并开发了基于随机步行聚集的图形神经网络(称为RAW-GNN)方法。提出的方法将随机步行策略与图神经网络集成在一起。新方法利用广度优先的随机步行搜索来捕获同质信息和深度优先搜索以收集异性信息。它用基于路径的社区取代了传统社区,并基于经常性神经网络引入了新的基于路径的聚合器。这些设计使RAW-GNN适用于同质图和异质图。广泛的实验结果表明,新方法在各种同质图和异质图上实现了最先进的性能。
translated by 谷歌翻译
神经网络已广泛应用于垃圾邮件和网络钓鱼检测,入侵预防和恶意软件检测等安全应用程序。但是,这种黑盒方法通常在应用中具有不确定性和不良的解释性。此外,神经网络本身通常容易受到对抗攻击的影响。由于这些原因,人们对可信赖和严格的方法有很高的需求来验证神经网络模型的鲁棒性。对抗性的鲁棒性在处理恶意操纵输入时涉及神经网络的可靠性,是安全和机器学习中最热门的主题之一。在这项工作中,我们在神经网络的对抗性鲁棒性验证中调查了现有文献,并在机器学习,安全和软件工程领域收集了39项多元化研究工作。我们系统地分析了它们的方法,包括如何制定鲁棒性,使用哪种验证技术以及每种技术的优势和局限性。我们从正式验证的角度提供分类学,以全面理解该主题。我们根据财产规范,减少问题和推理策略对现有技术进行分类。我们还展示了使用样本模型在现有研究中应用的代表性技术。最后,我们讨论了未来研究的开放问题。
translated by 谷歌翻译
阿尔茨海默氏病(AD)的早期诊断对于促进预防性护理以延迟进一步发展至关重要。本文介绍了建立在痴呆症Pitt copus上的基于最新的构象识别系统以自动检测的开发。通过纳入一组有目的设计的建模功能,包括基于域搜索的自动配置特异性构象异构体超参数除外,还包括基于速度扰动和基于规格的数据增强训练的基线构象体系统可显着改善。使用学习隐藏单位贡献(LHUC)的细粒度老年人的适应性;以及与混合TDNN系统的基于两次通行的跨系统逆转。在48位老年人的评估数据上获得了总体单词错误率(相对34.8%)的总体单词错误率(相对34.8%)。使用最终系统的识别输出来提取文本特征,获得了最佳的基于语音识别的AD检测精度为91.7%。
translated by 谷歌翻译